\chapter{Introduction}


\section{Aims of the project}
This project forms part of PATINA (Personal Architectonics of Interactions with Artefacts) \cite{1}, a larger three-year project funded by the RCUK Digital Economy Programme.
This wider consortium will provide researchers with new opportunities to create research spaces that emphasise the primacy of research material.

The focus of this specific project is to explore the design of a portable camera-projector system aimed at identifying research artefacts, specifically books and other objects bearing printed text, and at enhancing them with related projected information through a gesture-based user interface.

Recognition of text in scanned images has been the subject of extensive research. Less attention, however, has been paid to locating and extracting text from general indoor and outdoor environments. In both cases, past systems relied on standard 2D cameras and offered no form of interaction. In contrast, our approach will exploit the potential offered by depth cameras. In particular, we will use the depth information to help identify text present on hand-held objects, and to develop an arm-pointer mechanism that allows the user to select surfaces with simple pointing gestures. These two methods allow an area of the whole image to be selected, making text tracking interactive and thereby reducing the resolution demands.
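The arm-pointer mechanism described above can be sketched as a simple ray-casting step: given two joints tracked by the depth camera, the pointed-at location is where the ray from one joint through the other meets the target surface. The joint names (shoulder and hand) and the planar-surface model below are illustrative assumptions, not the final implementation:

```python
import numpy as np

def pointing_target(shoulder, hand, plane_point, plane_normal):
    """Intersect the shoulder->hand ray with a planar surface.

    Returns the 3D intersection point, or None if the ray is
    parallel to the plane or the surface lies behind the user.
    """
    shoulder = np.asarray(shoulder, dtype=float)
    direction = np.asarray(hand, dtype=float) - shoulder
    normal = np.asarray(plane_normal, dtype=float)

    denom = direction @ normal
    if abs(denom) < 1e-9:
        return None  # ray is parallel to the plane
    t = ((np.asarray(plane_point, dtype=float) - shoulder) @ normal) / denom
    if t < 0:
        return None  # intersection is behind the user
    return shoulder + t * direction
```

In practice the plane parameters would come from fitting the depth data of the candidate surface, and the joint positions from the camera's skeleton tracking.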

Another goal we aim to achieve is to create a device that interacts with the user without requiring any special equipment to be worn (e.g. coloured markers).

Central to this thesis is also the task of investigating the limits of current technology, focusing in particular on analysing the joint use of depth cameras and small-sized computing units.

\section{Usefulness}
The aims of this project are interesting and challenging from several points of view, in particular because we will explore new technologies in contexts and ways for which they were not originally intended.
The aim of the device is to help researchers in their research process, enhancing the research objects by projecting related information back into their research space. It can be employed in many different research areas and contexts, such as libraries, museums, archaeological fieldwork sites and translation tasks, and also as an aid for visually impaired people (using text-to-speech synthesizers).

The information retrieved will depend on the particular application and can consist of general web information about the object (e.g. Google Books), information shared by other researchers (e.g. annotations), or both.
With an extension of the project, the arm pointer could also be used to select different research objects and to retrieve information about the links among them.

Another possible application could be to use the depth camera to recognise all the research objects, including those without any text data.

Our gesture-controlled user interface will allow the user to interact directly with the research object, without the inconvenience of additional input devices such as keyboards or mice.
Its portability, furthermore, allows the device to be used practically anywhere, so the research process can be carried out directly in the field.

\section{Limitations}

Although it is part of a three-year project, this thesis had only limited time at its disposal. It therefore aims at laying the foundations for more advanced future work, leaving to that work in particular the tasks of physically building the portable device and of implementing all the features related to the particular context of deployment.

Current depth cameras produce depth data only beyond a certain minimum distance, which is a problem for a wearable device that must track the gestures of its own wearer. This forces us to consider only scenarios where the user is within the depth camera's working range.
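The working-range constraint can be made concrete by masking out depth readings that fall outside the usable interval. The numeric limits below are assumed values typical of a first-generation depth camera, not measured specifications of our hardware:

```python
import numpy as np

# Assumed usable range of the sensor, in metres; real limits
# depend on the specific camera and its operating mode.
MIN_RANGE_M = 0.8
MAX_RANGE_M = 4.0

def valid_depth_mask(depth_m):
    """Boolean mask of pixels whose depth reading is usable."""
    depth_m = np.asarray(depth_m, dtype=float)
    return (depth_m >= MIN_RANGE_M) & (depth_m <= MAX_RANGE_M)
```

Only pixels inside the mask would then be considered for text detection or gesture tracking.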

Unfortunately, when using only one device, occlusions can occur (e.g. when one part of the body hides the text data or another body part), so the user has to pay attention to the camera position. A possible extension of this work would be to employ multiple depth cameras to avoid the occlusion problem.

Another problem we will face stems from the novelty of the technologies analysed: new drivers, framework updates and fixes appear very frequently. This entails considerable additional work, since it is highly probable that we will have to implement features that will only later be released through new drivers and API updates.

\section{Structure of Thesis}
The structure of this thesis is organized as follows: \\ \\
\textbf{Chapter 2} covers the necessary technical background for this thesis and describes the prior work in related areas, providing the basis for understanding the decisions taken in the following chapters. \\ \\
\textbf{Chapter 3} describes the hardware and software architecture of the system. \\ \\
\textbf{Chapter 4} discusses our investigation of the technologies we considered worthy of attention. \\ \\
\textbf{Chapter 5} covers the software implementation in detail. \\ \\
\textbf{Chapter 6} provides the results of the tests carried out on our prototype, showing how the system performs and what it is capable of. \\ \\
\textbf{Chapter 7} draws conclusions from the study and discusses possible future work and extensions to this project.